Appendix Contents
Every moral scenario consists of a triple (context, action 1, action 2) and a set of auxiliary labels. The context describes a situation, and the actions describe two possible responses to it, phrased in the first person. The moral scenarios fall into two categories: 1. MoralChoice-LowAmbiguity and 2. MoralChoice-HighAmbiguity. The LLM-assisted construction of the scenarios (i.e., zero- and few-shot prompting setups) is grounded in a set of moral rules; an excerpt of the rule taxonomy follows.

| Category | Rule | Refined Rule Description |
| --- | --- | --- |
| Do not harm | Do not kill | Do not kill (i.e., do not cause permanent loss of consciousness). |
| Do not harm | Do not cause pain | Do not cause physical or emotional pain or unpleasant feelings (e.g., anger, sadness) to someone. |
| Do not harm | Do not disable | Do not deprive someone of their physical, mental or volitional ability. |
| Do not harm | Do not deprive of freedom | Do not deprive someone of their freedom (i.e., make a person unable to do something by altering the person's environment or situation). |
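To make the scenario structure above concrete, here is a minimal Python sketch of how such a record could be represented; the field names and example values are illustrative assumptions, not the dataset's actual schema.

```python
# Minimal sketch of a MoralChoice-style scenario record.
# Field names and the example values are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MoralScenario:
    context: str               # first-person description of the situation
    action1: str               # first possible action
    action2: str               # second possible action
    ambiguity: str             # "low" or "high"
    violated_rules: List[str] = field(default_factory=list)  # auxiliary labels, e.g. "Do not kill"

example = MoralScenario(
    context="You are driving and see a pedestrian starting to cross the road.",
    action1="I stop and let the pedestrian cross.",
    action2="I keep driving without slowing down.",
    ambiguity="low",
    violated_rules=["Do not harm"],
)
print(example)
```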
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
Evaluating the Moral Beliefs Encoded in LLMs
This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) a statistical method for eliciting beliefs encoded in LLMs, introducing statistical measures and evaluation metrics that quantify the probability of an LLM making a particular choice, the associated uncertainty, and the consistency of that choice; (2) an application of this method to study which moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill").
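As a rough illustration of the kinds of quantities such a survey method estimates, the sketch below computes a choice probability, a simple uncertainty proxy, and a consistency proxy from repeated model answers under several question forms. It is an assumption-laden toy, not the paper's actual estimator; the function name, the binomial standard error, and the majority-agreement consistency measure are all illustrative choices.

```python
# Hedged sketch: estimate choice probability, uncertainty, and cross-form consistency
# from sampled answers to a two-action scenario. Not the paper's exact measures.
from collections import Counter
from math import sqrt

def choice_statistics(answers_by_form: dict[str, list[str]]) -> dict:
    """answers_by_form maps a question-form id to sampled answers ('action1'/'action2')."""
    all_answers = [a for answers in answers_by_form.values() for a in answers]
    n = len(all_answers)
    p_action1 = Counter(all_answers)["action1"] / n
    # Binomial standard error as a simple uncertainty proxy.
    std_err = sqrt(p_action1 * (1 - p_action1) / n)
    # Consistency proxy: how often each form's majority choice agrees with the overall majority.
    overall_majority = "action1" if p_action1 >= 0.5 else "action2"
    per_form_majorities = [Counter(ans).most_common(1)[0][0] for ans in answers_by_form.values()]
    consistency = sum(m == overall_majority for m in per_form_majorities) / len(per_form_majorities)
    return {"p_action1": p_action1, "std_err": std_err, "consistency": consistency}

print(choice_statistics({
    "ab_form": ["action1", "action1", "action2"],
    "repeat_form": ["action1", "action1", "action1"],
    "compare_form": ["action1", "action2", "action1"],
}))
```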
Unveiling the Bias Impact on Symmetric Moral Consistency of Large Language Models
Large Language Models (LLMs) have demonstrated remarkable capabilities, surpassing human experts on various benchmark tests and playing a vital role across industry sectors. Despite their effectiveness, a notable drawback of LLMs is their inconsistent moral behavior, which raises ethical concerns. This work delves into symmetric moral consistency in large language models and demonstrates that modern LLMs lack sufficient consistency in moral scenarios. Our extensive investigation of twelve popular LLMs reveals that their assessed consistency scores are influenced by position bias and selection bias rather than their intrinsic abilities. We propose a new framework, tSMC, which gauges the effects of these biases and effectively mitigates their impact using the Kullback-Leibler divergence to pinpoint LLMs' mitigated Symmetric Moral Consistency. We find that the ability of LLMs to maintain consistency varies across moral scenarios: LLMs show more consistency in scenarios with clear moral answers than in those where no choice is morally perfect.
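As a hedged illustration of one plausible ingredient of such a bias measurement, the sketch below computes a symmetrised Kullback-Leibler divergence between the model's answer distributions with the two actions presented in original versus swapped order; the actual tSMC framework involves more than this single quantity, and the numbers are made up for illustration.

```python
# Hedged sketch: symmetrised KL divergence between answer distributions under the
# original and swapped option orders, as a crude position/selection-bias indicator.
import math

def symmetric_kl(p: list[float], q: list[float], eps: float = 1e-9) -> float:
    """Symmetrised KL divergence between two discrete distributions over the same choices."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

# Probabilities of choosing [action1, action2] with options shown in order A/B ...
p_original = [0.80, 0.20]
# ... and with the options swapped (answers re-mapped back to the actions).
p_swapped = [0.55, 0.45]
print(symmetric_kl(p_original, p_swapped))  # larger value -> stronger order-dependent bias
```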
Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
Bianca Raimondi, Daniela Dalbagno, Maurizio Gabbrielli
Large language models (LLMs) have been shown to internalize human-like biases during finetuning, yet the mechanisms by which these biases manifest remain unclear. In this work, we investigated whether the well-known Knobe effect, a moral bias in intentionality judgements, emerges in finetuned LLMs and whether it can be traced back to specific components of the model. We conducted a Layer-Patching analysis across three open-weight LLMs and demonstrated that the bias is not only learned during finetuning but also localized in a specific set of layers. Surprisingly, we found that patching activations from the corresponding pretrained model into just a few critical layers is sufficient to eliminate the effect. Our findings offer new evidence that social biases in LLMs can be interpreted, localized, and mitigated through targeted interventions, without the need for model retraining.
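The general activation-patching idea behind a Layer-Patching analysis can be sketched with PyTorch forward hooks, as below. The toy linear models stand in for real transformer blocks, and the procedure only illustrates the mechanism (capture a layer's activation from a reference model, substitute it into the same layer of a second model), not the paper's exact setup.

```python
# Hedged sketch of activation patching between two models using forward hooks.
# Requires PyTorch; the two tiny Sequential models are stand-ins for real LLM layers.
import torch
import torch.nn as nn

torch.manual_seed(0)
pretrained = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
finetuned = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))

x = torch.randn(1, 8)
patch_layer = 0      # index of the layer whose activations we patch
cache = {}

# 1) Run the reference ("pretrained") model and cache the activation at the chosen layer.
h1 = pretrained[patch_layer].register_forward_hook(
    lambda mod, inp, out: cache.setdefault("act", out.detach())
)
pretrained(x)
h1.remove()

# 2) Run the "finetuned" model, overriding that layer's output with the cached activation.
h2 = finetuned[patch_layer].register_forward_hook(lambda mod, inp, out: cache["act"])
patched_output = finetuned(x)
h2.remove()

print(patched_output)
```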
- North America > United States (0.14)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
MoralReason: Generalizable Moral Decision Alignment For LLM Agents Using Reasoning-Level Reinforcement Learning
Large language models are increasingly influencing human moral decisions, yet current approaches focus primarily on evaluating rather than actively steering their moral decisions. We formulate this as an out-of-distribution moral alignment problem, where LLM agents must learn to apply consistent moral reasoning frameworks to scenarios beyond their training distribution. We introduce Moral-Reason-QA, a novel dataset extending 680 human-annotated, high-ambiguity moral scenarios with framework-specific reasoning traces across utilitarian, deontological, and virtue ethics, enabling systematic evaluation of moral generalization in realistic decision contexts. Our learning approach employs Group Relative Policy Optimization with composite rewards that simultaneously optimize decision alignment and framework-specific reasoning processes to facilitate learning of the underlying moral frameworks. Experimental results demonstrate successful generalization to unseen moral scenarios, with softmax-normalized alignment scores improving by +0.757 for utilitarian and +0.450 for deontological frameworks when tested on out-of-distribution evaluation sets. The experiments also reveal training challenges and promising directions that inform future research. These findings establish that LLM agents can be systematically trained to internalize and apply specific moral frameworks to novel situations, providing a critical foundation for AI safety as language models become more integrated into human decision-making processes.
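As an illustration of what a composite reward of this kind might look like, the sketch below combines a decision-alignment term with a crude framework-specific reasoning score. The weights, keyword heuristic, and function name are assumptions for illustration only, not the paper's reward design.

```python
# Hedged sketch of a composite reward: decision alignment plus a rough proxy for
# framework-specific reasoning quality. All names and weights are illustrative.
FRAMEWORK_KEYWORDS = {
    "utilitarian": ["overall well-being", "consequences", "greatest good"],
    "deontological": ["duty", "rule", "obligation"],
    "virtue": ["character", "honest", "virtuous"],
}

def composite_reward(decision: str, reference_decision: str,
                     reasoning: str, framework: str,
                     w_decision: float = 0.7, w_reasoning: float = 0.3) -> float:
    """Weighted sum of decision alignment and a keyword-based reasoning score."""
    decision_reward = 1.0 if decision == reference_decision else 0.0
    keywords = FRAMEWORK_KEYWORDS[framework]
    hits = sum(kw in reasoning.lower() for kw in keywords)
    reasoning_reward = hits / len(keywords)   # fraction of framework cues present
    return w_decision * decision_reward + w_reasoning * reasoning_reward

print(composite_reward(
    decision="action1",
    reference_decision="action1",
    reasoning="Following my duty and the rule against lying, I refuse.",
    framework="deontological",
))
```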
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)